Parameterizable Hardware Architectures for Automatic Synthesis of Motion Estimation Processors
Abstract
A new class of fully parameterizable multiple array architectures for motion estimation (ME) in video sequences, based on the Full Search Block Matching (FSBM) algorithm, is proposed in this paper. This class is based on a new and efficient AB2 single array architecture with minimum latency, maximum throughput and full utilization of the hardware resources. It allows the target processors to be configured according to the setup parameters and the specified limits on processing time and circuit area. For this purpose, a software configuration tool has been implemented that determines the set of possible configurations fulfilling the requirements of the video coder and automatically generates the VHDL description of the selected configuration. The implementation of a single array processor configuration on a single chip is presented. Experimental results demonstrate that this configuration can estimate motion vectors in real time.

INTRODUCTION

Video coding systems have been assuming an increasingly important role in several application areas related to digital television, videophone and videoconferencing. Several video compression standards have been established for these applications [1], exploiting both the spatial and temporal redundancies of video sequences to achieve the required compression rates. Among these techniques, motion compensation has proved fundamental for improving interframe prediction in video coding. Because of the huge amount of computation required by ME, a great research effort has been made to develop efficient dedicated structures and specialized processors [2]. Owing to their regular processing scheme and simple control structures, FSBM algorithms using the sum of absolute differences (SAD) matching criterion have been the most widely used in VLSI implementations, providing optimal estimation results and leading to fast and efficient processing structures.
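As a software point of reference for the FSBM algorithm with the SAD criterion, the following minimal sketch searches every candidate displacement for one macroblock. It is not the hardware architecture discussed in this paper. The function names `sad` and `full_search`, the frame representation as nested lists, and the symmetric displacement range [-p, p] are assumptions made only for illustration.

```python
def sad(ref_block, frame, top, left):
    """Sum of absolute differences between ref_block and the
    same-size window of frame whose top-left corner is (top, left)."""
    return sum(
        abs(ref_block[i][j] - frame[top + i][left + j])
        for i in range(len(ref_block))
        for j in range(len(ref_block[0]))
    )


def full_search(ref_block, search_frame, mb_top, mb_left, p):
    """Exhaustively test all displacements (dy, dx) in [-p, p] x [-p, p]
    (assumed range) and return (best_sad, dy, dx) for the macroblock
    located at (mb_top, mb_left). Candidates that fall outside the
    frame are skipped."""
    n = len(ref_block)
    height, width = len(search_frame), len(search_frame[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            top, left = mb_top + dy, mb_left + dx
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue  # candidate block lies partly outside the frame
            cost = sad(ref_block, search_frame, top, left)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)  # keep the first minimum found
    return best


# Toy example: an 8x8 frame with unique pixel values and a 2x2 reference
# block copied from position (3, 4) of that frame.
frame = [[r * 8 + c for c in range(8)] for r in range(8)]
block = [row[4:6] for row in frame[3:5]]
result = full_search(block, frame, mb_top=2, mb_left=2, p=2)
print(result)
```

Because every candidate is evaluated with the same regular inner loop and no data-dependent branching, this computation maps naturally onto arrays of identical processing elements, which is the property the text attributes to FSBM.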
One of the first discussions of FSBM architectures and their classification was presented by Komarek and Pirsch [3]. They presented the characteristics of a set of 1-D and 2-D arrays, obtained by reducing the dimension of the original dependence graph using traditional index projection, time scheduling and graph folding techniques [4]. The main difference among these arrays is the processing concurrency they exploit, which implies different structures and different numbers of processor elements (PEs). Among all these systolic structures, the AB2 2-D architecture proposed by Vos [5, 6] has been regarded as one of the most efficient [7]. Its particular processing scheme gives it a short processing time with a limited amount of hardware resources. However, it still has some unexplored features that can be used to significantly improve its efficiency and level of parallelism. This architecture was therefore selected as the basis for the presented research. It will be shown that its hardware requirements can be significantly improved, since the amount of memory used to store the search area data can be substantially reduced to achieve full utilization of the hardware resources. Moreover, the proposed new single array architecture will be extended to multiple array architectures by exploiting the parallelism and lack of data dependencies of the ME procedure.

A new class of parameterizable array architectures was derived, integrating the proposed single array and multiple array architectures. This class was described using fully parameterizable VHDL code and its functionality was thoroughly tested. An integrated circuit based on the proposed class of architectures has been developed using a standard cell library of a 0.25 µm CMOS technology process. Experimental results demonstrate that this implemented configuration can estimate motion vectors in 4CIF video sequences at a rate of up to 16 frames/s.
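To give a feel for the scale of the real-time figure quoted above, the following back-of-the-envelope sketch counts the absolute-difference operations FSBM would need for 4CIF (704 × 576) video at 16 frames/s. The macroblock size `n = 16`, the maximum displacement `p = 8` and the candidate count of (2p + 1)² per macroblock are assumptions chosen for illustration, since the text does not state them. The function name `fsbm_workload` is likewise hypothetical, and frame-border effects are ignored, so the result is an upper bound.

```python
def fsbm_workload(width, height, fps, n, p):
    """Estimate the FSBM/SAD workload for a given video format.

    n (macroblock side, in pixels) and p (maximum displacement) are
    illustrative assumptions; candidates are counted at every offset
    in [-p, p] x [-p, p], ignoring frame borders.
    """
    mbs_per_frame = (width // n) * (height // n)
    mbs_per_second = mbs_per_frame * fps
    candidates_per_mb = (2 * p + 1) ** 2
    abs_diffs_per_candidate = n * n  # one absolute difference per pixel
    return {
        "macroblocks_per_frame": mbs_per_frame,
        "macroblocks_per_second": mbs_per_second,
        "candidates_per_macroblock": candidates_per_mb,
        "abs_diffs_per_second": (
            mbs_per_second * candidates_per_mb * abs_diffs_per_candidate
        ),
    }


# 4CIF at 16 frames/s, with the assumed n = 16 and p = 8.
workload = fsbm_workload(width=704, height=576, fps=16, n=16, p=8)
for name, value in workload.items():
    print(f"{name}: {value}")
```

Under these assumed parameters the count is on the order of 10⁹ absolute differences per second, which illustrates why dedicated parallel array structures are used for this task.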
NEW-AB2 SINGLE ARRAY ARCHITECTURE

As mentioned above, the proposed new single array architecture, designated New-AB2, significantly improves the utilization of the hardware resources. Because its processing scheme is similar to that of the architecture proposed by Vos, it is described by contrasting its optimized characteristics with those presented by Vos and Stegherr [5, 6].

Processor structure

The diagram in fig. 1 illustrates the main differences between the architecture proposed by Vos, drawn with solid and dotted lines, and the New-AB2 architecture, drawn with solid and dot-dashed lines. As in other AB2 structures, each pixel of the reference macroblock (MB) is assigned to one of the N² PEs that compute the SAD similarity function (designated active PEs). Besides this active block, the processor proposed by Vos also comprises two passive blocks with 2p × N passive PEs, which are appended to each side of the active block (see fig. 1). Each passive PE consists of running-data registers for the displacement and storage of search area pixels. Both the reference MB and the search area pixels are transferred into the processor through two vertical input
Similar resources
Automatic Synthesis of Motion Estimation Processors Based on a New Class of Hardware Architectures
A new class of fully parameterizable multiple array architectures for motion estimation in video sequences based on the Full-Search Block-Matching algorithm is proposed in this paper. This class is based on a new and efficient AB2 single array architecture with minimum latency, maximum throughput and full utilization of the hardware resources. It provides the ability to configure the target pro...
Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding
A novel all-binary motion estimation (ABME) with optimized hardware architectures
We present a fast motion estimation algorithm using only binary representation, which is desirable for both embedded system and hardware implementation with parallel architectures. The key algorithm distinction is that only the high-frequency spectrum is used. Our experimental results show that it provides excellent performance at both low and high bit rates. Because of its binary-only represen...
Customisable Core-Based Architectures for Real-Time Motion Estimation on FPGAs
This paper proposes new core-based architectures for motion estimation that are customisable for different coding parameters and hardware resources. These new cores are derived from an efficient and fully parameterisable 2-D single array systolic structure for full-search block-matching motion estimation and inherit its configurability properties in what concerns the macroblock dimension, the s...
Hardware-Friendly Motion Estimation Algorithms and its Architectures for High Definition Videos
This paper presents details about three hardware-friendly motion estimation algorithms focused on high quality to high definition videos. The Dynamic Multi-Point Diamond Search (DMPDS), Spread and Iterative Search (S&IS) and Low Density and Iterative Search (LD&IS) are fast motion estimation algorithms focused on hardware implementation. These algorithms were evaluated for ten high definition se...